
26 research outputs found

Erratum to the article “The Kazusa cDNA project for identification of unknown human transcripts” [C. R. Biologies 326 (2004) 959–966]

Author: Osamu Ohara
Reiko Kikuno
Takahiro Nagase
Publication date: 01/01/2004

Comptes Rendus Biologies (CRBIOL)

CR Biologies

The Kazusa cDNA project for identification of unknown human transcripts

Author: Osamu Ohara
Reiko Kikuno
Takahiro Nagase
Publication date: 01/01/2003

Comptes Rendus Biologies (CRBIOL)

CR Biologies

GeneWaltz--A new method for reducing the false positives of gene finding

Author: AG Clark
AL Hughes
C Burge
ES Lander
G Parra
I Korf
IM Meyer
J Shendure
J Wang
JC Venter
K Misawa
Kazuharu Misawa
L Stein
L Zhang
M Stanke
MR Brent
Reiko F Kikuno
RH Waterston
S Karlin
SF Altschul
SJ Jones
W Makalowski
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Identifying protein-coding regions in genomic sequences is an essential step in genome analysis. It is well known that the proportion of false positives among genes predicted by current methods is high, especially when the exons are short. These false positives are problematic because they waste the time and resources of experimental studies. Methods: We developed GeneWaltz, a new filtering method that reduces the risk of false positives in gene finding. GeneWaltz uses a codon-to-codon substitution matrix that was constructed by comparing protein-coding regions from orthologous gene pairs between the mouse and human genomes. Using this matrix, a scoring scheme was developed that assigns higher scores to coding regions and lower scores to non-coding regions. Regions with high scores were considered candidate coding regions. One-dimensional Karlin-Altschul statistics were used to test the significance of the coding regions identified by GeneWaltz. Results: The proportion of false positives among genes predicted by GENSCAN and Twinscan was high, especially when the exons were short. GeneWaltz significantly reduced the ratio of false positives to all positives predicted by GENSCAN and Twinscan, especially when the exons were short. Conclusions: GeneWaltz will be helpful in experimental genomic studies. GeneWaltz binaries and the matrix are available online at http://en.sourceforge.jp/projects/genewaltz/.
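The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the GeneWaltz implementation: the function `toy_score` is a hypothetical stand-in for an entry of the codon-to-codon matrix, the two input sequences are assumed to be already aligned and in frame, and the search for the highest-scoring run of codon pairs uses a standard maximal-scoring-segment scan.

```python
def toy_score(codon_a, codon_b):
    """Hypothetical stand-in for one entry of a codon-to-codon matrix."""
    if codon_a == codon_b:
        return 2          # identical codons
    if codon_a[:2] == codon_b[:2]:
        return 1          # differ only at the third position
    return -3             # any other substitution


def codons(seq):
    """Split a sequence into complete codons, dropping a trailing partial one."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def best_segment(seq_a, seq_b, score=toy_score):
    """Return (best_score, start, end) of the maximal-scoring run of aligned
    codon pairs. Indices count codons; end is exclusive."""
    best = (0, 0, 0)
    running, start = 0, 0
    for i, (a, b) in enumerate(zip(codons(seq_a), codons(seq_b))):
        running += score(a, b)
        if running <= 0:
            # A non-positive prefix can never help; restart after it.
            running, start = 0, i + 1
        elif running > best[0]:
            best = (running, start, i + 1)
    return best


# Made-up aligned sequences, for illustration only.
print(best_segment("CCCATGGCTAAACCC", "GGGATGGCCAAATTT"))  # (5, 1, 4)
```

A segment score found this way is the kind of quantity whose significance Karlin-Altschul statistics assess; estimating the statistical parameters for a real matrix is outside the scope of this sketch.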

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Relationship between amino acid composition and gene expression in the mouse genome

Background: Codon bias refers to differences in the frequencies of synonymous codons among different genes. In many organisms, natural selection is considered to be a cause of codon bias because codon usage in highly expressed genes is biased toward optimal codons. Methods have previously been developed to predict the expression level of genes from their nucleotide sequences, based on the observation that synonymous codon usage shows an overall bias toward a few codons called major codons. However, the relationship between codon bias and gene expression level, as proposed by the translation-selection model, is less evident in mammals. Findings: We investigated the correlations between the expression levels of 1,182 mouse genes and amino acid composition, as well as between gene expression and codon preference. We found that a weak but significant correlation exists between gene expression levels and amino acid composition in mouse. In total, less than 10% of the variation in expression levels is explained by amino acid components. We found that the effect of codon preference on gene expression was weaker than the effect of amino acid composition, because no significant correlations were observed with respect to codon preference. Conclusion: These results suggest that it is difficult to predict expression level from amino acid components or from codon bias in mouse.
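To make "variation explained" concrete, the sketch below computes a Pearson correlation coefficient r and its square for a single predictor. It is an illustration only: the variable names and the numbers are made up, and the abstract does not say which statistical procedure the authors used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values: fraction of one amino acid per gene, and log expression.
amino_acid_fraction = [0.05, 0.07, 0.06, 0.08, 0.04, 0.09]
log_expression = [2.1, 1.8, 2.9, 2.4, 2.6, 2.2]

r = pearson_r(amino_acid_fraction, log_expression)
r_squared = r * r  # share of variance in one variable explained by the other
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

With one predictor, r squared is the fraction of variance explained, so a value below 0.10 corresponds to the "less than 10%" wording above.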

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

Author: Amid Clara
Apweiler Rolf
Ashurst Jennifer
Auffray Charles
Barrero Roberto A
Bellgard Matthew
Bonaldo Maria de Fatima
Bono Hidemasa
Bromberg Susan K
Brookes Anthony J
Bruford Elspeth
Carninci Piero
Chakraborty Ranajit
Chelala Claude
Chen Zhu
Couillault Christine
Debily Marie-Anne
Devignes Marie-Dominique
Dubchak Inna
Endo Toshinori
Estreicher Anne
Eveno Eric
Eyras Eduardo
Fujii Yasuyuki
Fukami-Kobayashi Kaoru
Fukuchi Satoshi
Go Mitiko
Gojobori Takashi
Gough Craig
Graudens Esther
Hahn Yoonsoo
Han Michael
Han Ze-Guang
Hanada Kousuke
Hanaoka Hideki
Harada Erimi
Hashimoto Katsuyuki
Hayashizaki Yoshihide
Hide Winston
Hilton Phillip
Hinz Ursula
Hirai Momoki
Hirakawa Mika
Hishiki Teruyoshi
Homma Keiichi
Hopkinson Ian
Ikeo Kazuho
Imanishi Tadashi
Imbeaud Sandrine
Inoko Hidetoshi
Isogai Takao
Itoh Takeshi
Jia Libin
Jin Lihua
Kanapin Alexander
Kanehisa Minoru
Kaneko Yayoi
Karavidopoulou Youla
Kasprzyk Arek
Kasukawa Takeya
Kelso Janet
Kersey Paul
Kikuno Reiko
Kim Sangsoo
Kimura Kouichi
Korn Bernhard
Koyanagi Kanako O
Kuryshev Vladimir
Lenhard Boris
Makalowska Izabela
Makalowski Wojciech
Makino Takashi
Mano Shuhei
Mariage-Samson Regine
Mashima Jun
Matsuda Hideo
Mewes Hans-Werner
Minoshima Shinsei
Miyazaki Satoru
Mulder Nicola
Nagai Keiichi
Nagasaki Hideki
Nagata Naoki
Nakai Kenta
Nakao Mitsuteru
Nigam Rajni
Nishikawa Ken
Nishikawa Tetsuo
Nomura Nobuo
O'Donovan Claire
Ogasawara Osamu
Ohara Osamu
Ohtsubo Masafumi
Oishi Michio
Okada Norihiro
Okazaki Yasushi
Okido Toshihisa
Okubo Kousaku
Oota Satoshi
Ota Motonori
Ota Toshio
Otsuki Tetsuji
Piatier-Tonneau Dominique
Poustka Annemarie
Quackenbush John
Gopinath Gopal R.
Ren Shuang-Xi
Richard Roberts
Saitou Naruya
Sakai Hiroaki
Sakai Katsunaga
Sakaki Yoshiyuki
Sakamoto Shigetaka
Sakate Ryuichi
Schupp Ingo
Servant Florence
Sherry Stephen
Shiba Rie
Shimizu Nobuyoshi
Shimoyama Mary
Simpson Andrew J
Soares Bento
Souza Sandro J. de
Steward Charles
Stodolsky Marvin
Strausberg Robert L
Sugano Sumio
Sugawara Hideaki
Suwa Makiko
Suzuki Mami
Suzuki Yoshiyuki
Suzuki Yutaka
Takagi Toshihisa
Takahashi Aiko
Takeda Jun-ichi
Tamiya Gen
Tamura Takuro
Tanaka Hiroshi
Tanaka Susumu
Tanino Motohiko
Tateno Yoshio
Taylor Todd
Terwilliger Joseph D
Thierry-Mieg Danielle
Thierry-Mieg Jean
Thomas Michael A
Tonellato Peter
Unneberg Per
Veeramachaneni Vamsi
Wagner Lukas
Watanabe Shinya
Wiemann Stefan
Wilming Laurens
Yamaguchi-Kabata Yumi
Yamasaki Chisato
Yasuda Norikazu
Yasuda Tomohiro
Yoo Hyang-Sook
Yura Kei
Publication venue: Public Library of Science
Publication date: 01/01/2004

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Queensland University of Technology ePrints Archive

Research Repository

Hokkaido University Collection of Scholarly and Academic Papers

UPF Digital Repository

White Rose Research Online

MPG.PuRe

Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

Author: Amid Clara
Ashurst Jennifer
Barrero Roberto A.
Bellgard Matthew
Bono Hidemasa
Bromberg Susan K.
Brookes Anthony J.
Bruford Elspeth
Carninci Piero
Chelala Claude
Couillault Christine
de Fatima Bonaldo Maria
de Souza Sandro J.
Debily Marie-Anne
Devignes Marie-Dominique
Dubchak Inna
Endo Toshinori
Estreicher Anne
Eveno Eric
Eyras Eduardo
Fujii Yasuyuki
Fukami-Kobayashi Kaoru
Fukuchi Satoshi
Gopinath Gopal R.
Gough Craig
Graudens Esther
Hahn Yoonsoo
Han Michael
Han Ze-Guang
Hanada Kousuke
Hanaoka Hideki
Harada Erimi
Hashimoto Katsuyuki
Hilton Phillip
Hinz Ursula
Hirai Momoki
Hirakawa Mika
Hishiki Teruyoshi
Homma Keiichi
Hopkinson Ian
Ikeo Kazuho
Imanishi Tadashi
Imbeaud Sandrine
Inoko Hidetoshi
Itoh Takeshi
Jia Libin
Jin Lihua
Kanapin Alexander
Kaneko Yayoi
Karavidopoulou Youla
Kasprzyk Arek
Kasukawa Takeya
Kelso Janet
Kersey Paul
Kikuno Reiko
Kim Sangsoo
Kimura Kouichi
Korn Bernhard
Koyanagi Kanako O.
Kuryshev Vladimir
Lenhard Boris
Makalowska Izabela
Makino Takashi
Mano Shuhei
Mariage-Samson Regine
Mashima Jun
Matsuda Hideo
Mewes Hans-Werner
Minoshima Shinsei
Miyazaki Satoru
Mulder Nicola
Nagai Keiichi
Nagasaki Hideki
Nagata Naoki
Nakao Mitsuteru
Nigam Rajni
Nishikawa Tetsuo
O'Donovan Claire
Ogasawara Osamu
Ohara Osamu
Ohtsubo Masafumi
Okada Norihiro
Okido Toshihisa
Oota Satoshi
Ota Motonori
Ota Toshio
Otsuki Tetsuji
Piatier-Tonneau Dominique
Poustka Annemarie
Ren Shuang-Xi
Saitou Naruya
Sakai Hiroaki
Sakai Katsunaga
Sakamoto Shigetaka
Sakate Ryuichi
Schupp Ingo
Servant Florence
Sherry Stephen
Shiba Rie
Sugano Sumio
Suzuki Yoshiyuki
Suzuki Yutaka
Takeda Jun-ichi
Tamura Takuro
Tanaka Susumu
Tanino Motohiko
Thierry-Mieg Danielle
Thierry-Mieg Jean
Thomas Michael A.
Yamaguchi-Kabata Yumi
Yamasaki Chisato
Yasuda Tomohiro
Yura Kei
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2004

Online publication. Peer-reviewed journal article. National audience.

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions.
We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

INRIA a CCSD electronic archive server

Protein–Protein Interactions Between Large Proteins: Two-Hybrid Screening Using a Functionally Classified Library Composed of Long cDNAs

Author: Kikuno Reiko
Nakayama Manabu
Ohara Osamu
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/11/2002

Large proteins have multiple domains that are potentially capable of binding many kinds of partners. It is conceivable, therefore, that such proteins could function as an intricate framework of assembly protein complexes. To comprehensively study protein–protein interactions between large KIAA proteins, we have constructed a library composed of 1087 KIAA cDNA clones based on prior functional classifications done in silico. We were guided by two principles that raise the success rate for detecting interactions per tested combination: we avoided testing low-probability combinations, and we reduced the number of potential false negatives that arise from the fact that large proteins cannot reliably be expressed in yeast. The latter was addressed by constructing a cDNA library composed of random fragments encoding large proteins. Cytoplasmic domains of KIAA transmembrane proteins (>1000 amino acids) were used as bait for yeast two-hybrid screening. Our analyses reveal that several KIAA proteins bearing a transmembrane region have the capability of binding to other KIAA proteins containing domains (e.g., PDZ, SH3, rhoGEF, and spectrin) known to be localized to highly specialized submembranous sites, indicating that they participate in cellular junction formation, receptor or channel clustering, and intracellular signaling events. Our representative library should be a very useful resource for detecting previously unidentified interactions because it complements conventional expression libraries, which seldom contain large cDNAs. [Interaction data accession numbers are BIND ID 12487–12570. Supplemental material is available online at http://www.genome.org.]

Crossref

PubMed Central

Method for systematic targeted isolation of homologous cDNA fragments in a multiplex format

Author: Chinault A.C.
Hisashi Koga
Osamu Ohara
Reiko Kikuno
Reiko Ohara
Publication venue: Future Science Ltd

Crossref